feat(experimentation): environment-scoped metrics & experiment results by gagantrivedi · Pull Request #7674 · Flagsmith/flagsmith

gagantrivedi · 2026-06-02T09:50:11Z

What

Adds a reusable, environment-scoped Metric and wires metrics into experiments end to end, ClickHouse-native. Builds on the existing experimentation app (Experiment, WarehouseConnection).

Data model

Metric — environment-scoped, soft-delete. metric_type (numeric/conversion), aggregation (count/sum/mean), and a JSON definition (the recipe: event + optional filters/value/window). Immutable for now (no update endpoint).
ExperimentMetric — attaches a metric to an experiment with an expected_direction; unique per (experiment, metric).
MetricResultSnapshot — freezes computed results once an experiment completes.
Experiment gains exposure_event (default $flag_exposure) and control_variant.

API (gated on `EXPERIMENT_FLAG` + environment admin)

…/environments/{key}/experiment-metrics/ — metric library: list / create / retrieve / delete. (Not metrics/ — that path is taken by the usage-metrics viewset.) Deletion is blocked while attached to an active experiment.
…/experiments/{id}/metrics/ — attach / list / detach, with same-environment + unique-attach validation.
…/experiments/{id}/results/ — per-metric per-variant n/mean/variance, relative lift, confidence interval, and a per-metric verdict. Cached to a snapshot once completed.

Results engine

query.py builds the assignment (argMin first-touch on $flag_exposure.value) + metric CTEs from a metric definition. Untrusted values are bound params; LEFT JOIN … coalesce(…,0) keeps assigned-but-inactive identities as real zeros.
stats.py compares variants with a Welch/z two-sample test (CI included; for a 0/1 conversion column this reduces to a two-proportion z-test).
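
As a rough illustration of the comparison described above, here is a minimal sketch of an unpooled (Welch-style) two-sample z comparison with a confidence interval and relative lift. It is not the code in stats.py: the names `VariantStats` and `compare_variants`, the returned keys, and the default `alpha` are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class VariantStats:
    # Hypothetical container for the per-variant n/mean/variance triple.
    n: int
    mean: float
    variance: float


def compare_variants(
    control: VariantStats, treatment: VariantStats, alpha: float = 0.05
) -> dict:
    """Compare treatment against control with an unpooled z statistic."""
    diff = treatment.mean - control.mean
    # Unpooled standard error: each arm contributes its own variance / n.
    se = math.sqrt(control.variance / control.n + treatment.variance / treatment.n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z = diff / se if se > 0 else 0.0
    return {
        "diff": diff,
        "relative_lift": diff / control.mean if control.mean else None,
        "ci": (diff - z_crit * se, diff + z_crit * se),
        "p_value": 2 * (1 - NormalDist().cdf(abs(z))),
    }


# For a 0/1 conversion column the variance is p * (1 - p), so the same
# formula becomes an unpooled two-proportion z-test.
control = VariantStats(n=1000, mean=0.10, variance=0.10 * 0.90)
treatment = VariantStats(n=1000, mean=0.12, variance=0.12 * 0.88)
result = compare_variants(control, treatment)
```

In this example the interval straddles zero, so a verdict built on it would presumably be inconclusive; how the PR maps intervals to verdicts is not shown in the description.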

Scope notes (intentional cuts for v1)

Primary metrics only — no role/secondary/guardrail concept yet.
Metrics are immutable — no edit endpoint.
No metric validation dry-run yet.
Numeric count/sum/mean dedupe on a natural key to blunt at-least-once Firehose duplicates; residual collision risk documented in query.py. A per-event id in the ingest stream is the clean long-term fix.
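
To make the natural-key dedupe and its residual collision risk concrete, here is a small pure-Python sketch. The real logic lives in ClickHouse SQL in query.py; the `Event` fields chosen as the natural key here (identity, event name, timestamp, value) are an assumption for illustration only.

```python
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    # Hypothetical event shape; the PR's actual natural key is not shown.
    identity: str
    name: str
    timestamp: str
    value: float


def dedupe_events(events: Iterable[Event]) -> List[Event]:
    """Keep the first event seen for each natural key."""
    seen = set()
    unique = []
    for event in events:
        key = (event.identity, event.name, event.timestamp, event.value)
        if key in seen:
            # Treated as an at-least-once redelivery and dropped.
            continue
        seen.add(key)
        unique.append(event)
    return unique


events = [
    Event("user-1", "purchase", "2026-06-01T10:00:00Z", 20.0),
    Event("user-1", "purchase", "2026-06-01T10:00:00Z", 20.0),  # redelivery
    Event("user-1", "purchase", "2026-06-01T10:05:00Z", 5.0),
]
unique_events = dedupe_events(events)
total = sum(event.value for event in unique_events)
```

The collision risk is visible in the key: two genuinely distinct events that agree on every key field are indistinguishable from a redelivery and collapse into one, which is why a per-event id is the cleaner fix.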

Testing

47 new unit tests (models, metric CRUD, attach/detach, SQL builder, stats, results, snapshot).
Full experimentation suite green (191 passed); mypy clean; migrations complete.

🤖 Generated with Claude Code

vercel · 2026-06-02T09:50:18Z

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments

Project	Deployment	Actions	Updated (UTC)
docs	Ignored	Preview	Jun 3, 2026 7:21am
flagsmith-frontend-preview	Ignored	Preview	Jun 3, 2026 7:21am
flagsmith-frontend-staging	Ignored	Preview	Jun 3, 2026 7:21am

github-actions · 2026-06-02T09:51:19Z

Docker builds report

Image	Build Status	Security report
`ghcr.io/flagsmith/flagsmith-e2e:pr-7674`	Finished ✅	Skipped
`ghcr.io/flagsmith/flagsmith-api-test:pr-7674`	Finished ✅	Skipped
`ghcr.io/flagsmith/flagsmith-api:pr-7674`	Finished ✅	Results ✅
`ghcr.io/flagsmith/flagsmith:pr-7674`	Finished ✅	Results ✅
`ghcr.io/flagsmith/flagsmith-private-cloud:pr-7674`	Finished ✅	Results ✅
`ghcr.io/flagsmith/flagsmith-frontend:pr-7674`	Finished ✅	Results ✅

codecov · 2026-06-02T09:55:58Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.53%. Comparing base (9bdf0f2) to head (c961486).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff            @@
##             main    #7674    +/-   ##
========================================
  Coverage   98.52%   98.53%            
========================================
  Files        1444     1449     +5     
  Lines       55083    55276   +193     
========================================
+ Hits        54273    54466   +193     
  Misses        810      810


github-actions · 2026-06-02T09:57:18Z

Playwright Test Results (oss - depot-ubuntu-latest-16)

1 passed

Details

1 test across 1 suite
33 seconds
86ccf2a
🔄 Run: #17168 (attempt 1)

Playwright Test Results (oss - depot-ubuntu-latest-arm-16)

1 passed

Details

1 test across 1 suite
35.1 seconds
86ccf2a
🔄 Run: #17168 (attempt 1)

Playwright Test Results (private-cloud - depot-ubuntu-latest-arm-16)

1 passed

Details

1 test across 1 suite
39.8 seconds
86ccf2a
🔄 Run: #17168 (attempt 1)

Playwright Test Results (private-cloud - depot-ubuntu-latest-16)

1 passed

Details

1 test across 1 suite
52.5 seconds
86ccf2a
🔄 Run: #17168 (attempt 1)

github-actions · 2026-06-02T09:59:28Z

Visual Regression

19 screenshots compared. See report for details.

… attachment

Add a reusable, environment-scoped Metric and the ExperimentMetric join that attaches metrics to experiments.

- Models: Metric (numeric; count/sum/mean/occurrence aggregations + JSON definition), ExperimentMetric (expected_direction; one attach per experiment+metric); Experiment gains exposure_event ($flag_exposure) and control_variant.
- Metric library CRUD under environments/{key}/experiment-metrics/, gated on EXPERIMENT_FLAG + environment admin. Metrics are immutable for now (no update); deletion blocked while attached to an active experiment.
- Attach/detach metrics under an experiment, with same-environment and unique-attach validation.

Results computation (ClickHouse query builder + statistics) is intentionally kept on a separate branch; this branch is models + API only.

Zaimwa9 · 2026-06-03T07:54:53Z

+    expected_direction = models.CharField(
+        max_length=20,
+        choices=ExpectedDirection.choices,
+    )


Interesting. So you see it on the Experiment × Metric relation. I could see it there too, although I'm wondering whether that reflects reality.

Let's say we have these metrics:

Conversion rate (up)

Average basket (up)

Time to activation (down)

First time page render (down)

Is there a world in which we'd want an experiment to push one of these in the other direction? If not, I'd put the direction on the metric itself and maybe allow overriding it per experiment (in v2).

OK, I think I'm actually mixing up two things:

The metric polarity (is higher better or is lower better, given what the metric measures) -> I think we should add this one as well.

The experiment impact (the metric should go up, stay at the same level, or go down), especially for a guardrail => this is expected_direction, which we should keep.
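
The distinction drawn above can be sketched as two independent checks on an observed lift. This is only an illustration of the idea in this comment: `Polarity`, `ExpectedImpact`, their member names, and the `tolerance` parameter are hypothetical and do not reflect the PR's actual `ExpectedDirection` choices.

```python
from enum import Enum


class Polarity(Enum):
    # Hypothetical property of the metric itself.
    HIGHER_IS_BETTER = "higher_is_better"
    LOWER_IS_BETTER = "lower_is_better"


class ExpectedImpact(Enum):
    # Hypothetical per-experiment expectation for the metric.
    UP = "up"
    SAME = "same"
    DOWN = "down"


def is_improvement(lift: float, polarity: Polarity) -> bool:
    """Is the observed change good, given what the metric measures?"""
    if polarity is Polarity.HIGHER_IS_BETTER:
        return lift > 0
    return lift < 0


def matches_expectation(
    lift: float, expected: ExpectedImpact, tolerance: float = 0.01
) -> bool:
    """Did the metric move the way this experiment expected it to?"""
    if expected is ExpectedImpact.UP:
        return lift > tolerance
    if expected is ExpectedImpact.DOWN:
        return lift < -tolerance
    return abs(lift) <= tolerance


# Time to activation dropped by 10% in an experiment that used it as a
# guardrail expected to stay flat: good for the metric, yet unexpected.
good = is_improvement(-0.10, Polarity.LOWER_IS_BETTER)
as_expected = matches_expectation(-0.10, ExpectedImpact.SAME)
```

Keeping the two answers separate is what lets a guardrail report "better than expected" rather than folding everything into a single direction field.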

gagantrivedi · 2026-06-03T08:09:27Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces Metric and ExperimentMetric models, along with corresponding API endpoints, serializers, permissions, audit logging, and unit tests to support experiment metrics. The review feedback highlights several critical improvements for robustness and business logic validation: using get_object_or_404 to handle missing or soft-deleted experiments cleanly, excluding soft-deleted experiments when checking for active metric attachments, adding defensive validation in ExperimentMetricSerializer (such as preventing modifications to completed experiments or attaching deleted metrics), and restricting the detachment of metrics from completed experiments.

gemini-code-assist · 2026-06-03T08:11:20Z

+    def _get_experiment(self) -> Experiment:
+        experiment: Experiment = Experiment.objects.get(
+            id=self.kwargs[self.experiment_url_kwarg],
+            environment__api_key=self.kwargs["environment_api_key"],
+        )
+        return experiment


Using Experiment.objects.get directly in the view helper will raise Experiment.DoesNotExist if the experiment is not found, resulting in an unhandled 500 Internal Server Error. Additionally, since the soft-delete manager may include soft-deleted rows on direct PK lookups, we should explicitly ensure we only retrieve active (non-deleted) experiments. Using get_object_or_404 with deleted_at__isnull=True ensures a clean 404 Not Found response is returned.

Suggested change

-    def _get_experiment(self) -> Experiment:
-        experiment: Experiment = Experiment.objects.get(
-            id=self.kwargs[self.experiment_url_kwarg],
-            environment__api_key=self.kwargs["environment_api_key"],
-        )
-        return experiment
+    def _get_experiment(self) -> Experiment:
+        from django.shortcuts import get_object_or_404
+
+        return get_object_or_404(
+            Experiment,
+            id=self.kwargs[self.experiment_url_kwarg],
+            environment__api_key=self.kwargs["environment_api_key"],
+            deleted_at__isnull=True,
+        )

gemini-code-assist · 2026-06-03T08:11:20Z

+        if (
+            ExperimentMetric.objects.filter(metric=instance)
+            .exclude(experiment__status=ExperimentStatus.COMPLETED)
+            .exists()
+        ):


When checking if a metric is attached to an active experiment, we must exclude soft-deleted experiments. Since ExperimentMetric is a standard model and does not automatically get deleted when an Experiment is soft-deleted, 'ghost' attachments to soft-deleted experiments will permanently block the deletion of the metric. Adding experiment__deleted_at__isnull=True prevents this issue.

+        if (
+            ExperimentMetric.objects.filter(
+                metric=instance,
+                experiment__deleted_at__isnull=True,
+            )
+            .exclude(experiment__status=ExperimentStatus.COMPLETED)
+            .exists()
+        ):
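
The effect of the extra filter can be modelled without the ORM. The sketch below is a plain-Python stand-in for the queryset above, not the view code: the `Experiment` and `Attachment` dataclasses, the `COMPLETED` string value, and `blocks_metric_deletion` are hypothetical names used only to show which attachments should block a delete.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

COMPLETED = "completed"  # assumed status value, for illustration only


@dataclass
class Experiment:
    status: str
    deleted_at: Optional[datetime] = None  # set when soft-deleted


@dataclass
class Attachment:
    metric_id: int
    experiment: Experiment


def blocks_metric_deletion(metric_id: int, attachments: List[Attachment]) -> bool:
    """True if the metric is attached to a live, not-yet-completed experiment."""
    return any(
        attachment.metric_id == metric_id
        and attachment.experiment.deleted_at is None  # ignore ghost attachments
        and attachment.experiment.status != COMPLETED
        for attachment in attachments
    )


running = Experiment(status="running")
ghost = Experiment(status="running", deleted_at=datetime(2026, 6, 1))
done = Experiment(status=COMPLETED)
```

Without the `deleted_at is None` condition, the `ghost` attachment would block deletion indefinitely, which is the failure mode the review comment describes.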

gemini-code-assist · 2026-06-03T08:11:20Z

+    def validate(self, attrs: dict[str, Any]) -> dict[str, Any]:
+        experiment: Experiment = self.context["experiment"]
+        metric: Metric = attrs.get("metric", getattr(self.instance, "metric", None))
+
+        if metric.environment_id != experiment.environment_id:
+            raise serializers.ValidationError(
+                {"metric": "Metric must belong to the experiment's environment."}
+            )
+
+        attached = experiment.experiment_metrics.all()
+        if isinstance(self.instance, ExperimentMetric):
+            attached = attached.exclude(pk=self.instance.pk)
+
+        if "metric" in attrs and attached.filter(metric=metric).exists():
+            raise serializers.ValidationError(
+                {"metric": "Metric is already attached to this experiment."}
+            )
+        return attrs


We should add defensive checks and business logic validation to ExperimentMetricSerializer.validate:

Prevent attaching or modifying metrics of a completed experiment (experiment.status == 'completed').

Ensure metric is not None to avoid potential AttributeError on metric.environment_id.

Prevent attaching a soft-deleted metric (metric.deleted_at is not None).

+    def validate(self, attrs: dict[str, Any]) -> dict[str, Any]:
+        experiment: Experiment = self.context["experiment"]
+        if experiment.status == "completed":
+            raise serializers.ValidationError(
+                "Cannot modify metrics of a completed experiment."
+            )
+
+        metric: Metric = attrs.get("metric", getattr(self.instance, "metric", None))
+        if not metric:
+            raise serializers.ValidationError({"metric": "Metric is required."})
+
+        if metric.deleted_at is not None:
+            raise serializers.ValidationError(
+                {"metric": "Cannot attach a deleted metric."}
+            )
+
+        if metric.environment_id != experiment.environment_id:
+            raise serializers.ValidationError(
+                {"metric": "Metric must belong to the experiment's environment."}
+            )
+
+        attached = experiment.experiment_metrics.all()
+        if isinstance(self.instance, ExperimentMetric):
+            attached = attached.exclude(pk=self.instance.pk)
+
+        if "metric" in attrs and attached.filter(metric=metric).exists():
+            raise serializers.ValidationError(
+                {"metric": "Metric is already attached to this experiment."}
+            )
+        return attrs

gemini-code-assist · 2026-06-03T08:11:20Z

+    def perform_create(self, serializer: BaseSerializer[ExperimentMetric]) -> None:
+        serializer.save(experiment=self._get_experiment())


We should prevent detaching metrics from a completed experiment. Overriding the destroy method in ExperimentMetricViewSet to check the experiment's status ensures the integrity of completed experiment results.

Suggested change

-    def perform_create(self, serializer: BaseSerializer[ExperimentMetric]) -> None:
-        serializer.save(experiment=self._get_experiment())
+    def perform_create(self, serializer: BaseSerializer[ExperimentMetric]) -> None:
+        serializer.save(experiment=self._get_experiment())
+
+    def destroy(self, request: Request, *args: object, **kwargs: object) -> Response:
+        experiment = self._get_experiment()
+        if experiment.status == ExperimentStatus.COMPLETED:
+            return Response(
+                {"detail": "Cannot detach metrics from a completed experiment."},
+                status=status.HTTP_400_BAD_REQUEST,
+            )
+        return super().destroy(request, *args, **kwargs)

Copilot

Pull request overview

Adds a first-pass “metric library” for experimentation by introducing environment-scoped Metric objects and an ExperimentMetric join model, then exposing CRUD / attach / detach APIs under environment + experiment routes with auditing and permissions.

Changes:

Add Metric + ExperimentMetric models (with migration) and extend audit related-object types.
Introduce metric library endpoints (/experiment-metrics/) and experiment metric attachment endpoints (/experiments/{id}/metrics/) with permissions, serializers, and audit logs.
Add unit tests covering metric CRUD, immutability expectations, attach/detach flows, and basic validation.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
api/tests/unit/experimentation/test_metric_views.py	Adds API tests for environment-scoped metric library CRUD and permission / flag gating.
api/tests/unit/experimentation/test_metric_models.py	Adds model-level tests for defaults, uniqueness, and soft-delete behavior.
api/tests/unit/experimentation/test_experiment_metric_views.py	Adds API tests for attaching, listing, detaching, and updating experiment-metric relationships.
api/experimentation/views.py	Adds `MetricViewSet` and `ExperimentMetricViewSet` and wires metric audit logging + delete-guard logic.
api/experimentation/services.py	Adds `create_metric_audit_log` and reuses existing feature-flag helpers.
api/experimentation/serializers.py	Adds `MetricSerializer` and `ExperimentMetricSerializer` with definition + attachment validation.
api/experimentation/permissions.py	Adds `MetricPermission` to gate metric library endpoints on experiment flag + env admin.
api/experimentation/models.py	Introduces `MetricAggregation`, `ExpectedDirection`, `Metric`, and `ExperimentMetric`.
api/experimentation/migrations/0005_metrics.py	Creates DB tables for `Metric` and `ExperimentMetric`.
api/experimentation/metric_urls.py	Registers the metric library router under the environment.
api/experimentation/experiment_urls.py	Adds nested routes for `/experiments/{id}/metrics/` using nested routers.
api/environments/urls.py	Includes the new experiment-metrics URL module under environments.
api/audit/related_object_type.py	Adds `METRIC` related object type for audit logs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        if (
+            ExperimentMetric.objects.filter(metric=instance)
+            .exclude(experiment__status=ExperimentStatus.COMPLETED)
+            .exists()
+        ):


+    def _get_experiment(self) -> Experiment:
+        experiment: Experiment = Experiment.objects.get(
+            id=self.kwargs[self.experiment_url_kwarg],
+            environment__api_key=self.kwargs["environment_api_key"],
+        )
+        return experiment


+class ExperimentMetricSerializer(serializers.ModelSerializer):  # type: ignore[type-arg]
+    metric = serializers.PrimaryKeyRelatedField(  # type: ignore[var-annotated]
+        queryset=Metric.objects.all(),
+    )
+    metric_name = serializers.CharField(source="metric.name", read_only=True)
+    aggregation = serializers.CharField(source="metric.aggregation", read_only=True)
+


gagantrivedi requested review from a team as code owners June 2, 2026 09:50

gagantrivedi requested review from emyller and removed request for a team June 2, 2026 09:50

flagsmith-engineering Bot assigned emyller Jun 2, 2026

github-actions Bot added the api, infrastructure, and feature labels and removed the infrastructure label Jun 2, 2026

gagantrivedi marked this pull request as draft June 2, 2026 09:53

gagantrivedi removed the request for review from emyller June 2, 2026 09:53

gagantrivedi assigned emyller and unassigned emyller Jun 2, 2026

gagantrivedi force-pushed the feat/experiment-metrics branch from 86ccf2a to 435d91f Compare June 2, 2026 10:41

github-actions Bot added feature New feature or request and removed feature New feature or request labels Jun 2, 2026

gagantrivedi force-pushed the feat/experiment-metrics branch from 435d91f to 9568226 Compare June 2, 2026 11:01

github-actions Bot added feature New feature or request and removed feature New feature or request labels Jun 2, 2026

gagantrivedi force-pushed the feat/experiment-metrics branch from 9568226 to 2fea3fa Compare June 3, 2026 07:00

github-actions Bot added feature New feature or request and removed feature New feature or request labels Jun 3, 2026

gagantrivedi force-pushed the feat/experiment-metrics branch from 2fea3fa to c961486 Compare June 3, 2026 07:21

github-actions Bot added feature New feature or request and removed feature New feature or request labels Jun 3, 2026

Zaimwa9 reviewed Jun 3, 2026


gagantrivedi requested a review from Copilot June 3, 2026 08:09

Copilot started reviewing on behalf of gagantrivedi June 3, 2026 08:09 View session

gemini-code-assist Bot reviewed Jun 3, 2026


Copilot AI reviewed Jun 3, 2026




4 participants
